Approximation Techniques to Enable Dimensionality Reduction for Voronoi-Based Nearest Neighbor Search

Authors

  • Christoph Brochhaus
  • Marc Wichterich
  • Thomas Seidl
Abstract

Utilizing spatial index structures on secondary memory for nearest neighbor search in high-dimensional data spaces has been the subject of much research. With the potential to host larger indexes in main memory, applications demanding a high query throughput stand to benefit from index structures tailored for that environment. “Index once, query at very high frequency” scenarios on semi-static data require particularly fast responses while allowing for more extensive precalculations. One such precalculation consists of indexing the solution space for nearest neighbor queries, as used by the approximate Voronoi cell-based method. A major deficiency of this promising approach is the lack of a way to incorporate effective dimensionality reduction techniques. We propose methods to overcome the difficulties faced for normalized data and present a second reduction step that improves response times by limiting the dimensionality of the Voronoi cell approximations. In addition, we evaluate the suitability of our approach for main memory indexing, where speedup factors of up to five can be observed for real-world data sets.
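
As a toy illustration of "indexing the solution space" for nearest neighbor queries, the sketch below assumes one-dimensional data, where each Voronoi cell is simply an interval bounded by the midpoints between neighboring points. It is not the authors' method, which targets high-dimensional data and approximated cells; the function names `build_cells` and `nearest` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left


def build_cells(points):
    """Precompute 1-D Voronoi cell boundaries.

    Assumes a non-empty collection of numbers. The boundary between two
    neighboring sites is their midpoint.
    """
    sites = sorted(set(points))
    boundaries = [(a + b) / 2 for a, b in zip(sites, sites[1:])]
    return sites, boundaries


def nearest(sites, boundaries, q):
    """Answer a nearest neighbor query by locating the cell that contains q."""
    # The number of boundaries strictly below q is the index of q's cell.
    return sites[bisect_left(boundaries, q)]


if __name__ == "__main__":
    sites, boundaries = build_cells([10, 1, 4])
    for q in (0, 2, 3, 8, 42):
        print(q, "->", nearest(sites, boundaries, q))
```

The precalculation is done once in `build_cells`; each query afterwards is a binary search, which mirrors the "index once, query at very high frequency" setting in miniature. A query that falls exactly on a boundary is equidistant from two sites, and this sketch returns the smaller one.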

Similar resources

Voronoi Projection-Based Fast Nearest-Neighbor Search Algorithms: Box-Search and Mapping Table-Based Search Techniques

In this paper we consider fast nearest-neighbor search techniques based on the projections of Voronoi regions. The Voronoi diagram of a given set of points provides an implicit geometric interpretation of nearest-neighbor search and serves as an important basis for several proximity search algorithms in computational geometry and in developing structure-based fast vector quantization techniques...

Towards Optimal ε-Approximate Nearest Neighbor Algorithms in Constant Dimensions

The nearest neighbor (NN) problem on a set of n points P is to build a data structure that, when given a query point q, finds p ∈ P such that for all p′ ∈ P, d(p, q) ≤ d(p′, q). In low dimensions (2 or 3), this is considered a solved problem, with techniques such as Voronoi diagrams providing practical, log-height tree structures. Finding algorithms that work in arbitrary dimension has been m...

Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space

Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor se...

Fast Nearest-Neighbor Search Algorithms Based on High-Multidimensional Data

Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore pre-compute the result of any nearest-neighbor s...

Metric-Based Shape Retrieval in Large Databases

This paper examines the problem of database organization and retrieval based on computing metric pairwise distances. A low-dimensional Euclidean approximation of a high-dimensional metric space is not efficient, while search in a high-dimensional Euclidean space suffers from the “curse of dimensionality”. Thus, techniques designed for searching metric spaces must be used. We evaluate several su...

Publication date: 2006